Unveiling genetic signatures associated with resilience to neonatal diarrhea in lambs through two GWAS approaches

Neonatal diarrhea presents a significant global challenge due to its multifactorial etiology, resulting in high morbidity and mortality rates, and substantial economic losses. While molecular-level studies on genetic resilience/susceptibility to neonatal diarrhea in farm animals are scarce, prior observations indicate promising research directions. Thus, the present study utilizes two genome-wide association approaches, pKWmEB and MLM, to explore potential links between genetic variations in innate immunity and neonatal diarrhea in Karacabey Merino lambs. Analyzing 707 lambs, including 180 cases and 527 controls, revealed an overall prevalence rate of 25.5%. The pKWmEB analysis identified 13 significant SNPs exceeding the threshold of ≥ LOD 3. Moreover, MLM detected one SNP (s61781.1) in the SLC22A8 gene (p-value, 1.85eE-7), which was co-detected by both methods. A McNemar’s test was conducted as the final assessment to identify whether there are any major effective markers among the detected SNPs. Results indicate that four markers—oar3_OAR1_122352257, OAR17_77709936.1, oar3_OAR18_17278638, and s61781.1—have a substantial impact on neonatal diarrhea prevalence (odds ratio: 2.03 to 3.10; statistical power: 0.88 to 0.99). Therefore, we propose the annotated genes harboring three of the associated markers, TIAM1, YDJC, and SLC22A8, as candidate major genes for selective breeding against neonatal diarrhea.


Results
The overall prevalence of diarrhea in neonatal lambs was calculated to be 25.5% (180 cases, 527 controls, with a distribution of 24.1% in males and 26.7% in females.No significant differences were found between males and females (Chi-square, 0.613; exact p-value, 0.434).
After quality controls (QC), 36,529 SNPs and 707 lambs remained for further analysis.The multi-locus approach (pKWmEB) identified 13 SNPs meeting the threshold of ≥ LOD 3 (Table 1; Fig. 1a).Within a 300 KB ± proximity of the associated SNPs, 11 known and two novel candidate genes (ENSOARG00000016287, and ENSOARG00020003584) were detected.Seven of the associated SNPs were located within the relevant gene.
The classical MLM approach identified one SNP with an exact p-value of 1.85e-7 (Table 2, Fig. 1b).The SNP, s61781.1 in the SLC22A8 gene, was co-detected by both methods.
The quantile-quantile (QQ) plot showed no significant evidence of inflation or deflation factors in the test statistics.(λ = 1.03).This confirms that potential population stratification and/or cryptic relatedness were effectively controlled during the GWA studies (Fig. 2).
Gene enrichment and network analysis using the genes which are 100 KB ± distance to relevant SNP showed that identified candidate genes play a role within B cell receptor (BCR) signaling, the complement and coagulation cascade, the hematopoietic cell lineage, and the bile secretion pathways (Fig. 3).
In McNemar's test, four SNPs (rs424708132, rs422584795, rs416409832, and rs404606866) met the statistical power critical threshold, i.e., power ≥ 0.80, and p-value ≤ 0.05.The achieved statistical powers were observed to range between 0.88 and 0.99.Additionally, it was noted that the odds ratios (OR) for these SNPs ranged from 2.03 to 3.10.Among these four SNPs, the first two were observed to contribute to genetic susceptibility to neonatal diarrhea, while the last two were found to confer genetic resilience.All of these SNPs with major effects were affecting the occurrence of neonatal diarrhea in the dominant model (Table 3).

Discussion
The report described here presents the genetic background of neonatal diarrhea in farm animals for the first time, utilizing the sheep paradigm.Using two GWA approaches (single-locus MLM and multi-locus pKW-mEB), a total of 13 markers were identified affecting the prevalence of neonatal diarrhea with small effects ranking from 1.09% to 3.49% (Table 2).Among these 13 markers, s61781.1 (rs404606866) was co-detected by The proportion of phenotypic variance explained by significant SNP(%). 2 Gene annotation was performed using ARS-UI_Ramb_v2.0and Oar_v3.1 assemblies.www.nature.com/scientificreports/both methods with genome-wide significance.After conducting a McNemar's test for correlated proportions to investigate whether there are any markers with major effects among them, it was determined that four markers exhibit significant influence, with odds ratios ranging from 2.03 to 3.1 (statistical power ranging from 0.88 to 0.99).Once again, the marker s61781.1 (rs404606866) surpassed the third test.Three of these markers, namely oar3_OAR1_122352257, OAR17_77709936.1,and s61781.1,were found within intron regions of protein-coding genes (with rs IDs rs424708132, rs422584795, and rs404606866, respectively).The annotated genes for these three markers were identified as TIAM1, YDJC, and SLC22A8.

Marker
In the Sheep Breeding and Research Institute, an official organization affiliated with the Turkish Ministry of Agriculture and Forestry, which has an 80 year history in sheep farming and although it boasts considerable professional resources and personnel, the prevalence of neonatal diarrhea has been observed to reach concerning levels, such as 25.5%.As far as the authors are aware, no study has investigated the genetic factors contributing to resilience or susceptibility to neonatal diarrheas among farm animals at the molecular level.Nonetheless, there are noteworthy observations suggesting that despite harboring pathogens, many neonatal animals remain asymptomatic of diarrhea.For instance, in their case-control examination, Caffarena et al. 31 found that while their pen mates developed diarrhea (case groups; n = 264), the control groups (n = 271) consisted of non-diarrheic calves testing positive for Cryptosporidium spp.(26.9%), bovine astrovirus (22.4%),Rotavirus (11.1%),Salmonella enterica (2.6%), or Escherichia coli (1.8%).In another investigation, Abdou et al. 30 discovered that the prevalence of Group A rotaviruses in non-diarrheic cattle, sheep, and goats was 2.9, 1.3, and 2.8%, respectively.Zhong et al. 9 provided a possibly more dramatic example when they showed that the relative prevalence of potentially    www.nature.com/scientificreports/pathogenic bacteria, like Bacteroides, in healthy lambs was more than double that in diarrhoeic pen mates using 16S rRNA sequencing of the gut microbiota.These findings support the authors' conclusions, highlighting that variations in individuals' genetic backgrounds concerning innate immunity, coupled with other environmental factors like climatic conditions, management practices, and colostrum intake, significantly influence disease outcomes.
The present study delves into the intricate genetic landscape that underpins immunity, with a focus on key genes identified through dual GWAS.The SLCO2A1 gene encodes a crucial prostaglandin transporter essential for the cellular uptake of Prostaglandins (PGs) 35 .These PGs play pivotal roles in chronic inflammation by amplifying cytokine effects on inflammatory cells and regulating gene expression, thereby promoting autoimmune inflammation through the differentiation and proliferation of Th1 and Th17 cells 36 .Additionally, SLCO2A1 facilitates prostaglandin E2 (PGE2) secretion by macrophages, modulating neutrophil clearance from inflammatory sites 37 .
The TIAM family proteins, encompassing TIAM1 and TIAM2/STEF, function as Rac1-specific Guanine Nucleotide Exchange Factors (GEFs), holding pivotal significance across diverse cell types, including epithelial, neuronal, and immune cells.Termed Tiam GEFs, these proteins intricately regulate essential cellular processes such as migration, proliferation, and survival by initiating and guiding Rac1 signaling pathways.Perturbations in Tiam GEFs are intricately associated with a broad spectrum of human ailments, ranging from cancer to immunological and neurological disorders 38 .TIAM1 plays a role in the Wnt signaling pathway, engaging with proteins such as TAZ/YAP to govern gene expression and cellular functions 39 .Furthermore, TIAM1 participates in glucose-stimulated insulin secretion within pancreatic β-cells, underscoring its involvement in metabolic activities 40 .
The N-acetylated alpha-linked acidic dipeptidase-like 2 (NAALADL2) gene is implicated in various molecular functions and diseases, including intestinal Behçet's disease 41 , inflammation, apoptosis, and cardiovascular pathology in Kawasaki disease 42 .Moreover, NAALADL2 plays a role in Epstein-Barr virus infection, which is associated with disorders such as multiple sclerosis and lupus 43 .
The GOLM2 gene, also referred to as CASC4 and identified as a potential cancer susceptibility gene 44 , has been linked to reduced overall survival rates in ovarian cancer 45 .Recent studies conducted during the pandemic have highlighted the GOLM gene family, including GOLM1 and GOLM2, identifying them as potential causal plasma proteins associated with SARS-CoV-2 and suggesting their involvement in increased susceptibility to COVID-19 infection 46,47.The SDK2 gene is associated with circulating serum soluble gp130 (sgp130), an antagonist of the inflammatory response in atherosclerosis mediated by interleukin 6 48 .The role of SDK2 in immunity is indicated by its altered stage-specific expression during HIV-1 infection 49 .Additionally, it has been demonstrated that SDK2 is differentially expressed upon experimental infestation by the protozoan Amyloodinium ocellatum in European seabass tissues 50 .
Complement receptor (CR) type 2 (CR2/CD21) exhibits expression primarily during both the immature and mature stages of B cell maturation.In conjunction with CD19, CR2 plays a pivotal role in enhancing mature B cell responses to foreign antigens 51 .Research indicates that deficiency in CR2 could alter antibody production and lead to the deposition of immune complexes, thereby influencing the humoral immune response 52 .Moreover, CR2 has been linked to the recognition of foreign DNA during host-immune responses, highlighting its role in immune surveillance and pathogen response 53 .
The CEP350 gene has been implicated in the transcriptional response of macrophages triggered by Mycobacterium tuberculosis 54 and displays distinct expression patterns in erythema nodosum leprosum 55 .Additionally, CEP350 proteins interact with viral proteins of SARS-CoV-2 and exhibit altered expression upon infection 56 .
CTNND2 undergoes downregulation in the later stages of Cytomegalovirus infection in humans 57 .Notably, both CEP350 and CTNND2 genes are similarly implicated in the macrophage transcriptional response to Mycobacterium tuberculosis 54 .
The YDJC gene plays a pivotal role in autoimmune diseases by interacting with UBE2L3 promoters and co-bound transcription factors 58 .Furthermore, the involvement of the YDJC gene has been observed in other diseases such as Coeliac Disease 59 , idiopathic inflammatory myopathies 60 , and autophagy-dependent intracellular pathogen defense 61 .
The significant role of the SLC superfamily in drug transportation is widely recognized.As a member of this family, SLC22A8 encodes the protein OAT3, influx transporter protein, which primarily facilitates the uptake of substrates into cells 62 .The SLC22A8 gene is also implicated in various biological processes, including survival and immune infiltration in clear cell renal cell carcinoma (ccRCC), a prevalent form of renal malignancy worldwide 63 .Additionally, prior evidence suggests a link between acute neuroinflammation and alterations in the levels of prostaglandin E2, a substrate of SLC22A8.This association is accompanied by changes in SLC22A8 levels 64 .
Network analysis using the KEGG database identified four main pathways.Among these, the B cell receptor (BCR) signaling pathway, the complement and coagulation cascade pathways, and the hematopoietic cell lineage pathway emerged as directly associated with immunity, playing critical roles in immune cell activation, regulation, and differentiation.
The B cell receptor (BCR) signaling pathway serves as a pivotal mechanism in orchestrating the immune response upon antigen encounter.Upon binding of antigens to the BCR, a cascade of events is initiated, beginning with the clustering of BCRs and culminating in the activation of genes crucial for B-cell function and differentiation 65 .This pathway is characterized by the rapid phosphorylation of key molecules such as Igα/β and intracellular signaling proteins like lyn, syk, and BLNK 66 .Such signaling triggers essential processes including B-cell proliferation, differentiation, and antibody production.Co-stimulatory signals provided by molecules like CD40 further potentiate the response, particularly in the proliferation of antigen-stimulated B cells.Regulation of the BCR signaling pathway is finely tuned, with molecules such as CD22 exerting inhibitory effects on IgG1 B-cell receptor signaling, thereby impacting B-cell development 67 .Continuous signaling through CD22 is crucial for the normal development and survival of B cells within lymphoid tissues (Akatsu et al., 2022).Additionally, B cells play a pivotal role in antigen presentation to CD4 + T cells, facilitating B-cell help and subsequent differentiation into memory or plasma cells.Furthermore, this pathway is implicated in the production of IL-10 by B-1 cells, thereby contributing to the regulation of immune responses 68 .
Complement, an integral aspect of the innate and acquired immune system, operates through a series of proteolytic cascades initiated upon encountering microorganisms 69 .Activation of complement triggers potent proteolytic cascades, leading to opsonization, pathogen lysis, and the induction of inflammatory responses via proinflammatory molecular production 70 .However, dysregulation of the complement system can occur in various diseases, causing the tightly controlled proteolytic cascade to become harmful 71 .The complement system serves a pivotal role in innate immunity, bridging it with acquired immunity 72 .It defends against foreign pathogens by generating complement fragments that facilitate opsonization, chemotaxis, leukocyte activation, and cytolysis 73 .Blood coagulation, another intricate cascade, involves coagulation proteins as its core components, orchestrating a series of reactions that culminate in the conversion of soluble fibrinogen to insoluble fibrin clots.At the heart of this process lies thrombin, which binds to fibrin during clot formation, thereby enhancing clot strength and stability 74,75 .Protease-activated receptors, such as those activated by thrombin, belong to the family of G proteincoupled receptors, serving as mediators of innate immunity responses 76 .Moreover, the kallikrein-kinin system represents an endogenous metabolic cascade, with its activation leading to the release of vasoactive kinins, including bradykinin-related peptides.This intricate system involves precursors known as kininogens and primarily tissue and plasma kallikreins.The pharmacologically active kinins, often regarded as either proinflammatory or cardioprotective, are implicated in numerous physiological and pathological processes 77 .
The hematopoietic cell lineage pathway governs the intricate process of blood cell development, beginning with hematopoietic stem cells (HSCs) endowed with the capacity for self-renewal or differentiation into two primary progenitors: common lymphoid progenitors (CLPs) and common myeloid progenitors (CMPs).CLPs give rise to lymphoid lineage cells, including natural killer (NK) cells, T lymphocytes, and B lymphocytes, while CMPs differentiate into myeloid lineage cells, comprising various leukocytes, erythrocytes (red blood cells), and megakaryocytes responsible for platelet production, crucial for hemostasis.Differentiating cells express distinct surface markers indicative of their stage and lineage, facilitating their identification and characterization 78,79 .
In addition, network analysis revealed the bile secretion pathway that could have an indirect association with immunity.The bile secretion pathway orchestrates a sophisticated interplay of molecular determinants and signaling cascades crucial for maintaining liver function and whole-body homeostasis.Regulation of bile formation and secretion involves a complex network of factors, both in animal models and humans 80 .Central to this process are the membrane transport systems within hepatocytes and cholangiocytes, along with the structural integrity of the biliary tree.Hepatocytes, comprising the majority of liver cells, are responsible for generating primary bile within their canaliculi 81 .Moreover, beyond their traditional role in aiding lipid digestion, bile acids (BAs) function as vital signaling molecules, influencing lipid and glucose metabolism and modulating the gut microbiota composition 82 .Hence, one could speculate that bile formation and/or the composition of gut microbiota influenced by bile secretion may impact the pathogenesis of intestinal pathogens responsible for neonatal diarrhea.
Notably, annotated genes in the study highlight their potential role in immune response modulation and the pathogenesis of various diseases.For example, annotated genes are implicated in chronic inflammation, amplifying cytokine effects on inflammatory cells, inflammation and apoptosis, susceptibility to COVID-19, stagespecific expression during HIV-1 infection, B cell maturation, B cell responses to foreign antigens, recognition of foreign DNA, transcriptional response of macrophages triggered by Mycobacterium tuberculosis.Similarly, the identified pathways are somehow linked to the immune system.Taken together, the findings underscore that most of the identified SNPs and annotated genes are directly or indirectly involved in the host immune response, suggesting potential roles in immune response to neonatal diarrhea.
In conclusion, our findings draw attention to the significant challenge posed by the prevalence of neonatal diarrhea and its economic implications.Addressing neonatal diarrhea in sheep farming is crucial not only for the well-being of the animals and the sustainability of the industry but also for public health and food safety considerations.This study reveals a subtle connection between neonatal diarrhea and genes intricately involved in immune system regulation.
Furthermore, the results of the final statistical analysis, the McNemar's test, have revealed that four SNPs have a major effect on neonatal diarrhea with ≥ 0.88 statistical power (p ≤ 0.05).One of these SNPs is located in an intergenic region, while the other three are found in the intron regions of protein-coding genes.Therefore, we propose that these identified four SNPs, oar3_OAR1_122352257 (risk OR 3.10), OAR17_77709936.1 (risk OR 2.03), oar3_OAR18_17278638 (protective OR 2.11), and s61781.1 (protective OR 2.7), are candidate major effective markers, and the annotated three genes, TIAM1, YDJC, and SLC22A8, are candidate major genes for neonatal diarrhea in lambs.Thus, conducting selective breeding using risk markers for negative selection and protective markers for positive selection could result in a cumulative OR of 9.94 and potentially improve the herd's innate immunity against neonatal diarrhea.

Method Animals
The animal material utilized in the present study consisted of male (n = 332) and female (n = 375) Karacabey Merino lambs reared at the Sheep Breeding and Research Institute (SBRI).Karacabey Merino, acknowledged as one of Turkey's most popular commercial sheep breeds, underwent improvement efforts in the 1940s by strategically backcrossing German Mutton Merino rams with native Kivircik ewes.After these breeding efforts, the Karacabey Merino population has maintained its genetic integrity without backcrossing for over four decades.
The SBRI, functioning as the primary breeding center, has played a pivotal role in distributing Karacabey Merino sheep across Turkey since the 1940s.
Official veterinary records from the SBRI were obtained.There were 707 records for neonatal lamb which 180 of them experienced diarrhea in their neonatal life.Specifically, all the lambs were selected from two adjacent shelters allocated for males and females and subjected to uniform management practices.Thus, each lamb chosen for the study was considered to have an equal risk of infection, accounting for both the intensity and duration of exposure to pathogens.

Genotyping and quality controls
The animals were subjected to genotyping using the OVINE 50 K BeadChip on the ILLUMINA platform at a private laboratory, resulting in an overall genotyping rate of 0.995.Initially, genotype data pertaining to sex chromosomes were excluded from the further analysis.Subsequently, quality control procedures were carried out using Plink 1.07 83 .The applied quality control (QC) parameters included the following criteria: Minor Allele Frequency (MAF) > 0.05, animal missing genotype rates (mind) < 0.1, and adherence to the Hardy-Weinberg equilibrium threshold > 0.00001.Upon the completion of the QC process, a total of 36,529 SNPs and 707 lambs satisfied the predefined threshold criteria.

Association study
The GWA analyses were conducted using two approaches: the conventional single-locus MLM and the recently developed multi-locus pKWmEB.For both tests, the same parameters were used.Binary-formatted traits (diarrheic and non-diarrheic, or case-control) were derived from official veterinary records and used as traits for GWAS.Gender and birth type (single, twin, or triplet) were considered mixed effects.The quality of colostrum taken by the lamb is associated with the age of the mother and is among the major factors contributing to the onset of diarrhea.Therefore, the maternal age was also considered as a fixed effect.Lastly, to address population stratification, Principal Component Analysis (PCA) was conducted using TASSEL software 84 , and the first five principal components were included in the both statistical models.
pKWmEB GWA analysis was performed using the default parameters with mrMLM, an R package 85 .Kinship matrices were generated via the mrMLM software and incorporated into the model to control for cryptic relatedness.In multi-locus GWA studies, a generally accepted threshold for the logarithm of odds (LOD) score is LOD ≥ 3 33,34,86,87 .Thus, we employed the threshold LOD ≥ 3 for statistical significance of the pKWmEB method.
The single-locus MLM model was conducted using GEMMA software 88 .Kinship matrices were generated with the GEMMA for this test.The Bonferroni correction was applied for genome-wide statistical significance.
Manhattan plots were generated separately for both pKWmEB and MLM models.QQ plots were employed to identify any inflation or deflation of test statistics resulting from systemic biases such as population stratification or cryptic relatedness.

McNemar's test
McNemar's test, which is a non-parametric method, is performed on matched pairs and assesses whether the proportions of a binary outcome differ significantly between the paired observations 89 .Due to being based on the chi-square test, and depending on sample numbers, McNemar's test is generally not sensitive enough to detect small effects of variables; but it can detect major effects.For this purpose, 180 case-control matched pairs were constructed.In genotype-phenotype association studies, the greatest advantage of McNemar's test is its ability to standardize variables such as age, gender, breed, and farm, which could directly influence the results.Thefore, matching was perform based on gender, type of birth, and maternal age.Briefly, each case was matched with a control of the same gender, the same type of birth, and the same maternal age.Thus, the parameters used in both GWA methods were also utilized in the McNemar's test.
Each member of a case-control pair is assigned a value of '+' or '−' depending on whether the risk/protective factor is present (+) or absent (−) and falls into one of four possible categories pair status categories: '+,+'; '+,−'; '−,+'; or '−,−' .All subsequent test statistics are calculated from these values (Table 4).In McNemar's test, only the discordant pairs (b and c) are informative.
Statistical power for each SNP under McNemar's test was calculated using G*Power v3.1.9.7 software (Universtat Düsseldorf, Germany).In genetic association studies, a minimum of 0.80 statistical power (p ≤ 0.05) is commonly employed to ensure reliable statistics 90,91 .Therefore, only the SNPs meeting a statistical power of ≥ 0.80 (p ≤ 0.05) were deemed to have a significantly effect on neonatal diarrhea in the McNemar's test.

Figure 3 .
Figure 3. Gene enrichment and network analysis of associated candidate genes.

Table 2 .
Identified neonatal diarrhea associated SNP using single-locus MLM method.

Marker ID Rs ID Beta SE MAF Allele p-value -log10 (P) Bonferroni corrected p-value Association type
Figure 2. QQ plot for detecting inflation or deflation of the GWAS results.

Table 3 .
The results of McNemar's statistics and statistical power of each test.